KG-Infused RAG: Augmenting Corpus-Based RAG with External Knowledge Graphs
Dingjun Wu, Yukun Yan, Zhenghao Liu, Zhiyuan Liu, Maosong Sun
Retrieval-Augmented Generation (RAG) improves factual accuracy by grounding responses in external knowledge. However, existing RAG methods either rely solely on text corpora and neglect structural knowledge, or build ad-hoc knowledge graphs (KGs) at high cost and low reliability. To address these issues, we propose KG-Infused RAG, a framework that incorporates pre-existing large-scale KGs into RAG and applies spreading activation to enhance both retrieval and generation. KG-Infused RAG directly performs spreading activation over external KGs to retrieve relevant structured knowledge, which is then used to expand queries and integrated with corpus passages, enabling interpretable and semantically grounded multi-source retrieval. We further improve KG-Infused RAG through preference learning on sampled key stages of the pipeline. Experiments on five QA benchmarks show that KG-Infused RAG consistently outperforms vanilla RAG (by 3.9% to 17.8%). Compared with KG-based approaches such as GraphRAG and LightRAG, our method obtains structured knowledge at lower cost while achieving superior performance. Additionally, integrating KG-Infused RAG with Self-RAG and DeepNote yields further gains, demonstrating its effectiveness and versatility as a plug-and-play enhancement module for corpus-based RAG methods.
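To make the spreading-activation step concrete, the following is a minimal Python sketch, not the paper's implementation: the toy knowledge graph, the decay and threshold constants, and the triple-to-text query expansion are illustrative assumptions.

```python
# Toy KG as adjacency lists of (relation, neighbor) pairs -- illustrative data only.
KG = {
    "aspirin": [("treats", "headache"), ("is_a", "nsaid")],
    "nsaid": [("may_cause", "stomach_ulcer")],
    "headache": [("symptom_of", "migraine")],
}

def spreading_activation(seeds, kg, decay=0.6, threshold=0.2, max_hops=3):
    """Propagate activation outward from seed entities and return the
    triples whose source node stayed above the activation threshold."""
    activation = {s: 1.0 for s in seeds}
    frontier = list(seeds)
    triples = []
    for _ in range(max_hops):
        next_frontier = []
        for node in frontier:
            score = activation.get(node, 0.0)
            if score < threshold:
                continue
            for relation, neighbor in kg.get(node, []):
                triples.append((node, relation, neighbor, score))
                new_score = score * decay
                if new_score > activation.get(neighbor, 0.0):
                    activation[neighbor] = new_score
                    next_frontier.append(neighbor)
        frontier = next_frontier
    return triples

# Retrieved triples could then be verbalized to expand the original query.
facts = spreading_activation(["aspirin"], KG)
expanded_query = "side effects of aspirin " + " ".join(
    f"{h} {r} {t}" for h, r, t, _ in facts)
print(expanded_query)
```

Decaying activation with hop distance is what keeps the retrieved structured knowledge focused on entities close to the query rather than the whole graph.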
STEPER: Step-wise Knowledge Distillation for Enhancing Reasoning Ability in Multi-Step Retrieval-Augmented Language Models
Kyumin Lee, Minjin Jeon, Sanghwan Jang, Hwanjo Yu
Answering complex real-world questions requires step-by-step retrieval and integration of relevant information to generate well-grounded responses. However, existing knowledge distillation methods overlook the need for different reasoning abilities at different steps, hindering transfer in multi-step retrieval-augmented frameworks. To address this, we propose Stepwise Knowledge Distillation for Enhancing Reasoning Ability in Multi-Step Retrieval-Augmented Language Models (StepER). StepER employs step-wise supervision to align with evolving information and reasoning demands across stages. Additionally, it incorporates difficulty-aware training to progressively optimize learning by prioritizing suitable steps. Our method is adaptable to various multi-step retrieval-augmented language models, including those that use retrieval queries for reasoning paths or decomposed questions. Extensive experiments show that StepER outperforms prior methods on multi-hop QA benchmarks, with an 8B model achieving performance comparable to a 70B teacher model.
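As a rough illustration of the two ingredients named above, step-wise supervision and difficulty-aware training, here is a hedged Python sketch; the specific weighting formula and the per-step losses are assumptions made for illustration, not StepER's actual objective.

```python
def difficulty_weights(step_losses, epoch, total_epochs):
    """Assumed schedule: early in training, up-weight easier steps (low loss);
    later, shift weight toward harder steps (high loss)."""
    progress = epoch / max(1, total_epochs - 1)
    raw = [(1 - progress) / (loss + 1e-8) + progress * loss for loss in step_losses]
    total = sum(raw)
    return [r / total for r in raw]

def stepwise_distillation_loss(step_losses, epoch, total_epochs):
    """Combine the per-step teacher-student losses (one per retrieval /
    reasoning step) into a single scalar training objective."""
    weights = difficulty_weights(step_losses, epoch, total_epochs)
    return sum(w * loss for w, loss in zip(weights, step_losses))

# Three reasoning steps with increasing per-step loss: early vs. late epochs.
print(stepwise_distillation_loss([0.4, 1.1, 2.3], epoch=0, total_epochs=10))
print(stepwise_distillation_loss([0.4, 1.1, 2.3], epoch=9, total_epochs=10))
```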
DREAMS: Density Functional Theory Based Research Engine for Agentic Materials Simulation
Ziqi Wang, Hongshuo Huang, Hancheng Zhao, Changwen Xu, Shang Zhu, Jan Janssen, Venkatasubramanian Viswanathan
Materials discovery relies on high-throughput, high-fidelity simulation techniques such as Density Functional Theory (DFT), which require years of training, extensive parameter fine-tuning, and systematic error handling. To address these challenges, we introduce the DFT-based Research Engine for Agentic Materials Screening (DREAMS), a hierarchical, multi-agent framework for DFT simulation that combines a central Large Language Model (LLM) planner agent with domain-specific LLM agents for atomistic structure generation, systematic DFT convergence testing, High-Performance Computing (HPC) scheduling, and error handling. In addition, a shared canvas helps the LLM agents to structure their discussions, preserve context, and prevent hallucination. We validate DREAMS' capabilities on the Sol27LC lattice-constant benchmark, achieving average errors below 1% compared to the results of human DFT experts. Furthermore, we apply DREAMS to the long-standing CO/Pt(111) adsorption puzzle, demonstrating its long-term and complex problem-solving capabilities. The framework again reproduces expert-level literature adsorption-energy differences. Finally, DREAMS is employed to quantify functional-driven uncertainties with Bayesian ensemble sampling, confirming the Face Centered Cubic (FCC)-site preference at the Generalized Gradient Approximation (GGA) DFT level. In conclusion, DREAMS approaches L3-level automation (autonomous exploration of a defined design space) and significantly reduces the reliance on human expertise and intervention, offering a scalable path toward democratized, high-throughput, high-fidelity computational materials discovery.
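A highly simplified sketch of the hierarchical dispatch pattern described above, assuming a fixed plan and stubbed agents; the agent names, canvas fields, and placeholder values are hypothetical, and the real system plans adaptively and runs actual DFT and HPC jobs.

```python
canvas = {"log": []}  # shared scratchpad visible to every agent

def structure_agent(task, canvas):
    canvas["structure"] = f"slab model for {task['system']}"  # stub: build atomistic structure
    canvas["log"].append("structure generated")

def convergence_agent(task, canvas):
    canvas["cutoff_eV"] = 520  # placeholder for a systematic convergence test
    canvas["log"].append("convergence test done")

def hpc_agent(task, canvas):
    canvas["result"] = {"adsorption_energy_eV": None}  # stub: submit and collect job
    canvas["log"].append("job submitted")

PLAN = [structure_agent, convergence_agent, hpc_agent]

def planner(task):
    for agent in PLAN:        # the real planner chooses agents adaptively
        agent(task, canvas)   # and re-plans on errors reported via the canvas
    return canvas

print(planner({"system": "CO on Pt(111)"}))
```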
Reasoning Models Know When They're Right: Probing Hidden States for Self-Verification
Anqi Zhang, Yulin Chen, Jane Pan, Chen Zhao, Aurojit Panda, Jinyang Li, He He
Reasoning models have achieved remarkable performance on tasks like math and logical reasoning thanks to their ability to search during reasoning. However, they still suffer from overthinking, often performing unnecessary reasoning steps even after reaching the correct answer. This raises the question: can models evaluate the correctness of their intermediate answers during reasoning? In this work, we study whether reasoning models encode information about answer correctness by probing the model's hidden states. The resulting probe can verify intermediate answers with high accuracy and produces well-calibrated scores. Additionally, we find that models' hidden states also encode the correctness of future answers, enabling early prediction of correctness before the intermediate answer is fully formulated. We then use the probe as a verifier to decide whether to exit reasoning at intermediate answers during inference, reducing the number of inference tokens by 24% without compromising performance. These findings confirm that reasoning models do encode a notion of correctness yet fail to exploit it, revealing substantial untapped potential to enhance their efficiency.
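A minimal sketch of the probing idea, assuming a logistic-regression probe over hidden-state vectors and synthetic stand-in data; the probe choice, feature dimensionality, and exit threshold are illustrative assumptions rather than the authors' exact setup.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Stand-in data: hidden states (e.g., at the token ending an intermediate
# answer) paired with whether that answer was correct. In practice these
# would be extracted from the reasoning model itself.
hidden_states = rng.normal(size=(500, 64))
correct = (hidden_states[:, 0] + 0.1 * rng.normal(size=500) > 0).astype(int)

# Train the probe on part of the data and check it on the rest.
probe = LogisticRegression(max_iter=1000).fit(hidden_states[:400], correct[:400])
print("probe accuracy:", probe.score(hidden_states[400:], correct[400:]))

def should_exit(hidden_state, threshold=0.9):
    """Use the probe as a verifier: stop reasoning once the predicted
    probability that the current intermediate answer is correct is high."""
    p_correct = probe.predict_proba(hidden_state.reshape(1, -1))[0, 1]
    return p_correct >= threshold

print(should_exit(hidden_states[0]))
```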
EfficientRAG: Efficient Retriever for Multi-Hop Question Answering
Ziyuan Zhuang, Zhiyang Zhang, Sitao Cheng, Fangkai Yang, Jia Liu, Shujian Huang, Qingwei Lin, Saravan Rajmohan, Dongmei Zhang, Qi Zhang
Retrieval-augmented generation (RAG) methods encounter difficulties when addressing complex questions like multi-hop queries. While iterative retrieval methods improve performance by gathering additional information, current approaches often rely on multiple calls of large language models (LLMs). In this paper, we introduce EfficientRAG, an efficient retriever for multi-hop question answering. EfficientRAG iteratively generates new queries without the need for LLM calls at each iteration and filters out irrelevant information. Experimental results demonstrate that EfficientRAG surpasses existing RAG methods on three open-domain multi-hop question-answering datasets.
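The iterative loop can be sketched as follows; retrieve and label_chunks are stand-ins for the paper's retriever and small labeler/filter model, and the tagging logic is a placeholder chosen only to make the control flow runnable.

```python
def retrieve(query, k=4):
    return [f"passage about {query} #{i}" for i in range(k)]  # stub retriever

def label_chunks(question, chunks):
    """Stub for the small tagger: mark each chunk useful or irrelevant and
    extract the span that should seed the next-hop query."""
    return [(c, "useful", c.split()[-1]) for c in chunks[:2]] + \
           [(c, "irrelevant", "") for c in chunks[2:]]

def efficient_rag(question, max_hops=3):
    evidence, query = [], question
    for _ in range(max_hops):
        labeled = label_chunks(question, retrieve(query))
        useful = [c for c, tag, _ in labeled if tag == "useful"]
        if not useful:
            break                       # nothing new: stop iterating
        evidence.extend(useful)
        spans = [s for _, tag, s in labeled if tag == "useful"]
        query = " ".join(spans)         # next-hop query built without an LLM call
    return evidence                      # final answer generation uses one LLM call

print(efficient_rag("Who directed the film adapted from the novel X?"))
```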
Chain of Thoughtlessness? An Analysis of CoT in Planning
Kaya Stechly, Karthik Valmeekam, Subbarao Kambhampati
Large language model (LLM) performance on reasoning problems typically does not generalize out of distribution. Previous work has claimed that this can be mitigated with chain of thought prompting, a method of demonstrating solution procedures, with the intuition that it is possible to in-context teach an LLM an algorithm for solving the problem. This paper presents a case study of chain of thought on problems from Blocksworld, a classical planning domain, and examines the performance of two state-of-the-art LLMs across two axes: generality of the examples given in the prompt, and complexity of the problems queried with each prompt. Although our problems are very simple, we find meaningful performance improvements from chain of thought prompts only when those prompts are exceedingly specific to their problem class, and those improvements quickly deteriorate as the size n of the query-specified stack grows past the size of the stacks shown in the examples. We also create scalable variants of three domains commonly studied in previous CoT papers and demonstrate the existence of similar failure modes. Our results hint that, contrary to previous claims in the literature, CoT's performance improvements do not stem from the model learning general algorithmic procedures via demonstrations, but depend on carefully engineering highly problem-specific prompts. This spotlights drawbacks of chain of thought, especially the sharp tradeoff between possible performance gains and the amount of human labor necessary to generate examples with correct reasoning traces.
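One of the evaluation axes, scaling the query stack size past the sizes demonstrated in the prompt, can be illustrated with a small sketch; the query template and the stubbed model interface are hypothetical, not the paper's actual prompts or validator.

```python
def blocksworld_stacking_query(n):
    """Build a query asking for a plan that stacks n blocks into one tower."""
    blocks = [chr(ord("a") + i) for i in range(n)]
    goal = ", ".join(f"{top} is on {bottom}"
                     for top, bottom in zip(blocks[:-1], blocks[1:]))
    return f"All blocks start on the table. Goal: {goal}. Give a plan."

def sweep_sizes(ask_model, query_sizes=range(2, 9)):
    """ask_model(prompt) -> plan text; plans would then be checked by a
    symbolic validator (not included here)."""
    return {n: ask_model(blocksworld_stacking_query(n)) for n in query_sizes}

print(blocksworld_stacking_query(4))
```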
Deep Natural Language Feature Learning for Interpretable Prediction
Felipe Urrutia, Cristian Buc, Valentin Barriere
We propose a general method to break down a complex main task into a set of easier intermediary sub-tasks, which are formulated in natural language as binary questions related to the final target task. Our method allows for representing each example by a vector consisting of the answers to these questions. We call this representation Natural Language Learned Features (NLLF). NLLF is generated by a small transformer language model (e.g., BERT) that has been trained in a Natural Language Inference (NLI) fashion, using weak labels automatically obtained from a Large Language Model (LLM). We show that the LLM normally struggles with the main task when using in-context learning, but can handle these easier sub-tasks and produce useful weak labels to train a BERT. The NLI-like training of the BERT allows for tackling zero-shot inference with any binary question, not only the ones seen during training. We show that this NLLF vector not only helps to reach better performance by enhancing any classifier, but can also be used as input to an easy-to-interpret machine learning model like a decision tree. This decision tree is interpretable yet reaches high performance, surpassing that of a pre-trained transformer in some cases. We have successfully applied this method to two completely different tasks: detecting incoherence in students' answers to open-ended mathematics exam questions, and screening abstracts for a systematic literature review of scientific papers on climate change and agroecology.
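A compact sketch of how an NLLF vector might be assembled and fed to a decision tree; the sub-questions, the fake_nli scorer, and the toy labels are all stand-ins for the LLM-weak-labeled, BERT-NLI-trained components described above.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Binary sub-questions that decompose the main task (illustrative only).
SUB_QUESTIONS = [
    "Does the answer use the quantity asked about?",
    "Does the answer contradict itself?",
    "Does the answer state a final numeric result?",
]

def nllf_vector(example_text, answer_binary_question):
    """Build the Natural Language Learned Features vector: one entry per
    sub-question, produced by a small NLI-style model (stubbed here)."""
    return np.array([answer_binary_question(example_text, q) for q in SUB_QUESTIONS],
                    dtype=float)

def fake_nli(text, question):
    """Placeholder standing in for the BERT-NLI scorer."""
    return float(len(text) % 2 == len(question) % 2)

texts = [f"student answer {i}" for i in range(40)]
X = np.stack([nllf_vector(t, fake_nli) for t in texts])
y = (X[:, 0] > 0).astype(int)  # toy labels for the main task

# The interpretable downstream model trained on NLLF vectors.
clf = DecisionTreeClassifier(max_depth=3).fit(X, y)
print(clf.predict(X[:5]))
```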
Enhancing Retrieval-Augmented Large Language Models with Iterative Retrieval-Generation Synergy
Zhihong Shao, Yeyun Gong, Yelong Shen, Minlie Huang, Nan Duan, Weizhu Chen
Large language models are powerful text processors and reasoners, but are still subject to limitations including outdated knowledge and hallucinations, which necessitates connecting them to the world. Retrieval-augmented large language models have attracted extensive attention for grounding model generation on external knowledge. However, retrievers struggle to capture relevance, especially for queries with complex information needs. Recent work has proposed to improve relevance modeling by having large language models actively involved in retrieval, i.e., to improve retrieval with generation. In this paper, we show that strong performance can be achieved by a method we call Iter-RetGen, which synergizes retrieval and generation in an iterative manner. A model output shows what might be needed to finish a task, and thus provides an informative context for retrieving more relevant knowledge, which in turn helps generate a better output in the next iteration. Compared with recent work that interleaves retrieval with generation when producing an output, Iter-RetGen processes all retrieved knowledge as a whole and largely preserves the flexibility of generation without structural constraints. We evaluate Iter-RetGen on multi-hop question answering, fact verification, and commonsense reasoning, and show that it can flexibly leverage parametric and non-parametric knowledge, and is superior to or competitive with state-of-the-art retrieval-augmented baselines while incurring less retrieval and generation overhead. We can further improve performance via generation-augmented retrieval adaptation.
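A minimal sketch of the iterate-then-refine loop, assuming stubbed retrieve and generate functions in place of a dense retriever and an LLM; the two-iteration budget and the query composition are illustrative choices.

```python
def retrieve(query, k=3):
    return [f"passage for '{query}' #{i}" for i in range(k)]  # stub retriever

def generate(question, passages):
    return f"draft answer to '{question}' using {len(passages)} passages"  # stub LLM

def iter_retgen(question, iterations=2):
    passages, answer = [], ""
    for _ in range(iterations):
        query = f"{question} {answer}".strip()  # generation guides retrieval
        passages.extend(retrieve(query))
        answer = generate(question, passages)   # generation sees all retrieved knowledge
    return answer

print(iter_retgen("What year did the composer of film X's score win an award?"))
```

The design point this mirrors is that each round's draft output enriches the retrieval query, while generation always conditions on the full pool of retrieved passages rather than interleaving retrieval inside a single output.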